Self-Supervised Shape and Appearance Modeling via Neural Differentiable Graphics
Inferring 3D shape and appearance from natural images is a fundamental challenge in computer vision. Despite recent progress using deep learning methods, a key limitation is the availability of annotated training data, as acquisition is often very challenging and expensive, especially at a large scale. This thesis proposes to incorporate physical priors into neural networks that allow for self-supervised learning.
As a result, easy-to-access unlabeled data can be used for model training. In particular, novel algorithms for 3D reconstruction and texture/material synthesis are introduced, where only image data is available as a supervisory signal.
First, a method is proposed that learns to reason about 3D shape and appearance solely from unstructured 2D images, achieved via differentiable rendering in an adversarial fashion.
As shown next, learning from videos significantly improves 3D reconstruction quality. To this end, a novel ray-conditioned warp embedding is proposed that aggregates pixel-wise features from multiple source images.
Addressing the challenging task of disentangling shape and appearance, a method is first presented that enables 3D texture synthesis independent of shape or resolution. For this purpose, 3D noise fields of different scales are transformed into stationary textures. The method is able to produce 3D textures despite requiring only 2D textures for training.
Lastly, the surface characteristics of textures under different illumination conditions are modeled in the form of material parameters. To this end, a self-supervised approach is proposed that has no access to material parameters but only to flash images. Similar to the previous method, random noise fields are reshaped into material parameters, which are conditioned to replicate the visual appearance of the input under matching light.
Single-image Tomography: 3D Volumes from 2D Cranial X-Rays
As many different 3D volumes could produce the same 2D x-ray image, inverting
this process is challenging. We show that recent deep learning-based
convolutional neural networks can solve this task. As the main challenge in
learning is the sheer amount of data created when extending the 2D image into a
3D volume, we suggest first learning a coarse, fixed-resolution volume, which
is then fused in a second step with the input x-ray into a high-resolution
volume. To train and validate our approach, we introduce a new dataset that
comprises close to half a million computer-simulated 2D x-ray images of 3D
volumes scanned from 175 mammalian species. Applications of our approach
include stereoscopic rendering of legacy x-ray images and re-rendering of
x-rays with changes of illumination, view pose, or geometry. Our evaluation
includes comparisons to previous tomography work and to previous learning
methods using our data, a user study, and an application to a set of real
x-rays.
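The two-step scheme described above can be illustrated with a toy, untrained stand-in: a crude back-projection plays the role of the learned coarse network, and the fusion step rescales the coarse volume so it stays consistent with the input x-ray. The function names and the nearest-neighbour logic are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def coarse_volume(xray, depth=16):
    # Hypothetical stand-in for the learned coarse network: replicate the
    # 2D image along the depth axis as a crude, fixed-resolution back-projection.
    return np.repeat(xray[None, :, :], depth, axis=0) / depth

def fuse(coarse, xray):
    # Second step: modulate every slice so that re-projecting the volume
    # (summing along depth) reproduces the high-resolution input x-ray.
    scale = xray / (coarse.sum(axis=0) + 1e-8)
    return coarse * scale[None, :, :]
```

Summing the fused volume along the depth axis recovers the input image, which mirrors the additive x-ray formation model that makes the inversion ambiguous in the first place.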
Learning a Neural 3D Texture Space from 2D Exemplars
We propose a generative model of 2D and 3D natural textures with high
diversity, visual fidelity, and computational efficiency. This is enabled by a
family of methods that extend ideas from classic stochastic procedural
texturing (Perlin noise) to learned, deep, non-linearities. The key idea is a
hard-coded, tunable and differentiable step that feeds multiple transformed
random 2D or 3D fields into an MLP that can be sampled over infinite domains.
Our model encodes all exemplars from a diverse set of textures without a need
to be re-trained for each exemplar. Applications include texture interpolation
and learning 3D textures from 2D exemplars.
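The key idea of feeding transformed random fields into an MLP that can be sampled over an infinite domain can be sketched in a few lines. All weights below are random stand-ins for learned parameters, and sinusoidal fields are used as a simple stationary-noise proxy; this is a schematic illustration, not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random frequencies and phases define stationary noise fields over all of R^3.
freqs = rng.normal(size=(8, 3))            # 8 fields over 3D coordinates
phases = rng.uniform(0, 2 * np.pi, size=8)
# A small random MLP standing in for the learned non-linearity.
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)

def texture(p):
    """Sample an RGB value at any 3D point p (infinite domain, no tiling)."""
    fields = np.sin(freqs @ p + phases)        # transformed random fields
    h = np.tanh(fields @ W1 + b1)              # learned non-linearity (random here)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # RGB in [0, 1]
```

Because the fields are defined analytically, the texture can be evaluated at arbitrary points and resolutions without storing a volume.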
CamP: Camera Preconditioning for Neural Radiance Fields
Neural Radiance Fields (NeRF) can be optimized to obtain high-fidelity 3D
scene reconstructions of objects and large-scale scenes. However, NeRFs require
accurate camera parameters as input -- inaccurate camera parameters result in
blurry renderings. Extrinsic and intrinsic camera parameters are usually
estimated using Structure-from-Motion (SfM) methods as a pre-processing step to
NeRF, but these techniques rarely yield perfect estimates. Thus, prior works
have proposed jointly optimizing camera parameters alongside a NeRF, but these
methods are prone to local minima in challenging settings. In this work, we
analyze how different camera parameterizations affect this joint optimization
problem, and observe that standard parameterizations exhibit large differences
in magnitude with respect to small perturbations, which can lead to an
ill-conditioned optimization problem. We propose using a proxy problem to
compute a whitening transform that eliminates the correlation between camera
parameters and normalizes their effects, and we propose to use this transform
as a preconditioner for the camera parameters during joint optimization. Our
preconditioned camera optimization significantly improves reconstruction
quality on scenes from the Mip-NeRF 360 dataset: we reduce error rates (RMSE)
by 67% compared to state-of-the-art NeRF approaches that do not optimize for
cameras like Zip-NeRF, and by 29% relative to state-of-the-art joint
optimization approaches using the camera parameterization of SCNeRF. Our
approach is easy to implement, does not significantly increase runtime, can be
applied to a wide variety of camera parameterizations, and can
straightforwardly be incorporated into other NeRF-like models.
Comment: SIGGRAPH Asia 2023, Project page: https://camp-nerf.github.i
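The whitening idea can be illustrated numerically: given the Jacobian of a proxy projection problem with respect to the camera parameters, an inverse matrix square root of its (regularized) Gram matrix equalizes the parameters' effects. The proxy construction and names below are simplified assumptions for illustration, not CamP's exact formulation.

```python
import numpy as np

def whitening_preconditioner(J, eps=1e-6):
    """Inverse matrix square root of the regularized Gram matrix J^T J / n.

    J: (num_residuals, num_params) Jacobian of a proxy projection problem
    with respect to the camera parameters. Parameterizing theta = P @ q and
    optimizing q makes all parameter directions comparably scaled.
    """
    gram = J.T @ J / J.shape[0] + eps * np.eye(J.shape[1])
    w, V = np.linalg.eigh(gram)
    return V @ np.diag(w ** -0.5) @ V.T

# Mimic badly scaled camera parameters (e.g. focal length vs. rotation):
# perturbing different entries changes the residuals by wildly different amounts.
J = np.random.default_rng(1).normal(size=(100, 6)) \
    * np.array([1e3, 1e3, 1.0, 1.0, 1e-2, 1e-2])
P = whitening_preconditioner(J)
```

After preconditioning, the curvature seen by the optimizer, P @ gram @ P, is the identity, which is exactly the ill-conditioning fix the abstract describes.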
D32.1: Individual Use Cases and Test Scenarios Definition
ecoDriver targets a 20% reduction of CO2 emissions and fuel consumption in road transport by encouraging the adoption of green driving behaviour. Drivers will receive eco-driving recommendations and feedback adapted to them and to their vehicle characteristics. A range of driving profiles, powertrain
Femtosecond Transfer and Manipulation of Persistent Hot-Trion Coherence in a Single CdSe/ZnSe Quantum Dot
Ultrafast transmission changes around the fundamental trion resonance are
studied after exciting a p-shell exciton in a negatively charged II-VI quantum
dot. The biexcitonic induced absorption reveals quantum beats between hot trion
states at 133 GHz. While interband dephasing is dominated by relaxation of the
p-shell hole within 390 fs, trionic coherence remains stored in the spin system
for 85 ps due to Pauli blocking of the triplet electron. The complex
spectro-temporal evolution of transmission is explained analytically by solving
the Maxwell-Liouville equations. Pump and probe polarizations provide full
control over the amplitude and phase of the quantum beats.
Generative Modelling of BRDF Textures from Flash Images
We learn a latent space for easy capture, semantic editing, consistent
interpolation, and efficient reproduction of visual material appearance. When
users provide a photo of a stationary natural material captured under flash
light illumination, it is converted in milliseconds into a latent material
code. In a second step, conditioned on the material code, our method, again in
milliseconds, produces an infinite and diverse spatial field of BRDF model
parameters (diffuse albedo, specular albedo, roughness, normals) that allows
rendering in complex scenes and illuminations, matching the appearance of the
input picture. Technically, we jointly embed all flash images into a latent
space using a convolutional encoder, and -- conditioned on these latent codes
-- convert random spatial fields into fields of BRDF parameters using a
convolutional neural network (CNN). We condition these BRDF parameters to match
the visual characteristics (statistics and spectra of visual features) of the
input under matching light. A user study confirms that the semantics of the
latent material space agree with user expectations and compares our approach
favorably to previous work.
Unsupervised learning of 3D object categories from videos in the wild
Our goal is to learn a deep network that, given a small number of images of an object of a given category, reconstructs it in 3D. While several recent works have obtained analogous results using synthetic data or assuming the availability of 2D primitives such as keypoints, we are interested in working with challenging real data and with no manual annotations. We thus focus on learning a model from multiple views of a large collection of object instances. We contribute a new large dataset of object-centric videos suitable for training and benchmarking this class of models. We show that existing techniques leveraging meshes, voxels, or implicit surfaces, which work well for reconstructing isolated objects, fail on this challenging data. Finally, we propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction while obtaining a detailed implicit representation of the object surface and texture, also compensating for the noise in the initial SfM reconstruction that bootstrapped the learning process. Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks and on our novel dataset. For additional material please visit: https://henzler.github.io/publication/unsupervised_videos/
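The pixel-feature aggregation behind a warp-conditioned embedding can be sketched in a much-simplified form: project a 3D sample into each source view, look up per-pixel features, and pool them. Nearest-neighbour sampling and a plain mean stand in for the learned components, and all names are illustrative, not the paper's API.

```python
import numpy as np

def wcr_embedding(point, feats, cams):
    """Aggregate per-pixel features from multiple source views for a 3D point.

    point: (3,) world-space sample along a target ray
    feats: list of (C, H, W) feature maps, one per source image
    cams:  list of (3, 4) projection matrices (hypothetical pinhole setup)
    """
    samples = []
    for F, P in zip(feats, cams):
        x = P @ np.append(point, 1.0)          # project into the source view
        u, v = x[:2] / x[2]                    # pixel coordinates
        C, H, W = F.shape
        iu, iv = int(round(u)), int(round(v))  # nearest-neighbour lookup for brevity
        if 0 <= iu < W and 0 <= iv < H:
            samples.append(F[:, iv, iu])
    if not samples:
        return np.zeros(feats[0].shape[0])
    return np.mean(samples, axis=0)            # simple mean replaces the learned aggregator
```

In the actual method the aggregation is learned and the embedding conditions an implicit surface/texture network; the sketch only shows where the multi-view pixel features enter.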
A quantitative CT parameter for the assessment of pulmonary oedema in patients with acute respiratory distress syndrome.
Objectives: The aim of this study was to establish quantitative CT (qCT) parameters for pathophysiological understanding and clinical use in patients with acute respiratory distress syndrome (ARDS). The most promising parameter is introduced.
Materials and methods: 28 intubated patients with ARDS obtained a conventional CT scan in end-expiratory breathhold within the first 48 hours after admission to the intensive care unit (ICU). Following manual segmentation, 137 volume- and lung-weight-associated qCT parameters were correlated with 71 clinical parameters such as blood gases, applied ventilation pressures, pulse contour cardiac output measurements, and established status and prognosis scores (SOFA, SAPS II).
Results: Of all examined qCT parameters, excess lung weight (ELW), i.e. the difference between a patient's current lung weight and the virtual lung weight of a healthy person of the same height, displayed the most significant results. ELW correlated significantly with the amount of inflated lung tissue [%] (p
Conclusions: ELW could serve as a non-invasive method to quantify the amount of pulmonary oedema. It might serve as an early radiological marker of severity in patients with ARDS.
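The ELW definition in the abstract is a simple subtraction, which can be written down directly. The linear height model below is a placeholder assumption, not the healthy-reference regression used in the study.

```python
def excess_lung_weight(measured_weight_g, height_cm,
                       slope=10.0, intercept=-1000.0):
    """ELW = current lung weight minus the virtual lung weight of a healthy
    person of the same height. The linear predictor (slope, intercept) is a
    hypothetical stand-in for the study's reference model."""
    predicted_normal_g = slope * height_cm + intercept
    return measured_weight_g - predicted_normal_g
```

With the placeholder coefficients, a 180 cm patient with a measured lung weight of 1200 g would have an ELW of 400 g; only the subtraction, not the numbers, reflects the paper.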